Abstract

This report presents the results of a two-year nationwide field experiment designed to 
provide credible evidence about the effects of using computers in math instruction in grades five 
through eight. Ninety-six classes (48 pairs of "computer" and "traditional" classes) taught by 56
teachers in 31 schools from 25 districts in 16 states participated in the first year of the study; 
eleven teachers from nine schools employed the same "teacher-control" design through the 
second year of the experiment. The overall effect sizes found at the end of the first year on five
measures of math achievement, although generally above zero for the methodologically superior 
implementations, were not substantially above zero for the study population as a whole, except 
for the estimation subtest.

Since the mid-1980s, computers in elementary and middle grade schools have functioned to a
large extent as a medium for student practice in the skills and concepts of basic mathematics and logic. 
Nearly one-half of American elementary and middle-grades students use computers in their school 
mathematics activities (Martinez and Mead, 1988), and every week perhaps three to four million 
youngsters are engaged in answering math problems posed to them on computer screens. Yet, in spite 
of this widespread teaching practice, we really do not know (a) whether and (b) under what conditions 
students have learned more by practicing math and logic skills with the computer programs that their 
teachers provided than they would have if their math lessons had involved no computer use at all or had
used computers in mathematics in very different ways.

Although it is true that there is a growing research literature about the use of computers in 
mathematics, I would maintain that that literature is inadequate to questions about the effectiveness of 
typical practice. In "computers in basic mathematics," as in other areas of computer-assisted-instruction, 
three types of research dominate: 



Informal, formative assessments of learning, usually in conjunction with software 
development. Studies accompanying development projects often incorporate very sensitive
measures of learning and performance and are valuable for informing and improving the software 
and curriculum development that they accompany. However, the questions asked by this type 
of research tend to be of the nature, "Can computer-based activities enable students to gain 
certain competencies or improve their capacities?" rather than the question, "Do such activities 
typically result in those improvements?" These development projects often involve innovative 
ways of using computers and incorporate theoretically rich ideas about what students need to do 
to truly understand concepts and relationships. But they typically proceed under the assumption 
that students will not attain these competencies by traditional instruction, and therefore they do
not collect data comparing achievement by students experiencing the computer-based approach
with learning by other students experiencing other teaching approaches, or by the same students
during other time intervals. Moreover, the careful site selection and extensive monitoring that
often occur during development phases of instructional programs make it unclear whether the
program might be effective under more typical conditions. 

Large-scale (sometimes "system-wide") evaluations of a recently implemented computer- 
based instructional program, usually involving basic skills drill-and-practice. Most 
evaluations of system-wide computer-based interventions are planned after the implementation
of the program. As a result, these studies are typically left in a weak position with respect to the
inferences that they are trying to make. Some of these evaluations are made based only on
changes in standardized test score percentiles "pre-" and "post-" treatment for the same children. 
Particularly useless are non-control-group studies that use fall and spring test score comparisons 
because there is substantial evidence of inadequate norming by test publishers resulting in typical 
fall-to-spring gains of 8 NCE points regardless of the students' instructional experiences (Gabriel
et al., 1985). Other system-wide evaluations examine differences in achievement between
successive grade-cohorts (the "pre-implementation" cohort and the "post-implementation" cohort) 
or between schools participating in the program and "comparable" schools not participating. 
These designs rule out some threats to valid conclusions, but most of these studies
inadequately measure other factors that might also account for differences in student outcomes
between the schools or school-years in which the computer approach is used versus those where
it is not used: differences in teacher quality, clientele, socio-emotional climate, test preparation
activities, and curricular program. Selective publicity to the most favorable statistics, grade
levels, and implementations also results from our reliance on vendor-reported data as well as 
differential reporting of the more favorable vs. the less favorable district-produced evaluations. 






The third type of research data common in the computer-based-instruction literature consists of 
small-scale comparison studies, often involving only a handful of teachers in one school, 
often encompassing just several days or weeks of "treatment," and more often than not 
lacking in strict experimental control. Like the formative studies and the system-wide post-hoc 
evaluations, many small-scale comparison studies do not employ random assignment to treatments 
so that teachers using the computer-based approach may not only be different individuals than 
the teachers of comparison classes but possibly different in their teaching effectiveness. Students 
enrolled in experimental and control classes may be assigned non-randomly and be different in 
relevant ways (e.g., starting competence). And sometimes (and this is often difficult to know 
from the sketchy information provided in published reports) experimental and control classes 
spend unequal amounts of time on the subject-matter tested (i.e., the experiment is an "add-on" 
activity). In spite of their frequent design deficiencies, these studies purport to draw conclusions 
about differences in student learning outcomes that result from the computer-based approach 
studied. 



The value of these small-scale studies depends largely on the strength of their internal validity.
Even the best-designed studies have provided evidence only that computer-based interventions can work;
for two major reasons, they have not told us what works when or whether these approaches
generally do work.



Generalizability. Classroom environments in which computer software is used differ sharply 
from one to another. Far too often people improperly infer that results of a study done with a 
few teachers in a narrow range of circumstances are predictive of the kinds of results that will 
occur under different and possibly less favorable circumstances. Teachers differ in how well they 
can implement new approaches to their subject-matter. Similarly, classrooms vary in the amount 
of support and assistance that students require for attending to learning tasks and actually
learning. And the computer experiences themselves vary from one setting to another, even when
the same type of computer software is in use and certainly when different software is used.
When well-designed but small-scale studies obtain different conclusions about the effectiveness
of the computer-based approach that they studied (as they almost always do), these variations are
partly due to the random variation that occurs among small samples of students and teachers and 
partly due to the systematic variation that exists between one circumstance and another. Even 
the best-designed small studies suffer from both an inability to identify with confidence outcomes 
that have practical significance (because results that are important for policy purposes may still 
not be statistically significant) and an inability to identify the range of circumstances within 
which benefits are likely to occur. 

Germaneness. Many of these studies are reports of short-term interventions. Studies of short 
duration may indicate the value of using one program for teaching one concept or skill, but do 
not address the question of whether current school investments in using computers more broadly 
for instruction in one or more academic subjects are paying off. It may be that, in a particular 
subject domain, only topic-specific, occasional uses of computers are worthwhile. Although it 
is important to know which specific uses actually work and are better than existing alternatives, 
schools also need to know about the value of longer term, more integral uses of computer 
software which are more likely to justify and support major investments in computer hardware
than the occasional, specialized use. 



Many mathematics educators have argued that, in elementary and middle-grade mathematics, the 
germaneness issue is not answered merely by studying long-term (e.g., full year) interventions rather than 
isolated single-topic units. They argue that the mathematics curriculum itself is hardly germane to the 
most important numerical and symbol manipulation competencies that adults should have; for example,
that instruction is overburdened with teaching manual algorithmic skills for which students now have
other tools (e.g., hand calculators). However, regardless of the merits of those curriculum arguments,
school administrators who are answerable to many interests and publics that do not accept such
arguments need to know whether their use of computers to support instruction in their existing
mathematics curriculum is a more effective way of using their computer resources than other ways
that students and teachers might use school computers. Therefore, it is important that researchers address
the general question of "what effects are 'typical' computer-based instructional programs in mathematics 
having on students and under what circumstances," even when answering that question does not directly 
tell us about the potential value of using computers in other ways that might support other curricular
goals. 



In any event, to provide valid conclusions about the consequences of using computer approaches 
for any given curricular goal, it is essential that such research have high internal validity, incorporating
comparisons with alternative instructional approaches, control over "teacher effects," and randomized 
designs. And to understand the limits of applicability of those conclusions, it is important that the 
research be conducted over a large and representative domain of settings. 

Meta-Analyses. One approach to providing data about the mean and range of effects of using computers
has been to undertake secondary re-analyses of existing small-scale research. Often these studies are 
in the form of a meta-analysis, imposing a common statistical metric on diversely reported research and 
being as inclusive as possible in an effort to narrow the confidence limits of the estimated mean effect. 
The most recent meta-analysis of computer-based instructional research, the first to really focus on 
microcomputer-based interventions, was done by Roblyer, Castine, and King (1988). These researchers
examined roughly 200 studies published or reported between 1980 and 1988 and included 82 of those 
studies, about one-half dissertations and one-half published articles, in their meta-analysis. (Studies were 
excluded that failed to meet certain minimal requirements such as the presence of a comparison group 
and sufficient statistical data to be able to report effects in a common effect-size metric.)

Many of the 82 studies included in the meta-analysis have limited value for understanding the
effects of computer-based approaches to instruction. Studies of a single pair of classes were particularly
common, and others were studies of brief duration and limited computer treatment. One-third of the
studies in this meta-analysis were based on fewer than 20 students per treatment (if treatments were 
taught by a single teacher) or fewer than 35 students per treatment (if experimental and control teachers 
were different). About 20% of the studies involved a treatment period of fewer than 8 weeks and fewer 
than 10 hours total instructional time. 
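
To see why samples of this size limit what any single study can show, the sketch below approximates the sampling uncertainty of a standardized mean difference (effect size) for two equal-sized groups. It uses a common large-sample approximation for the standard error, sqrt(2/n + d^2/(4n)); that formula, the function names, and the illustrative effect size of 0.3 are assumptions introduced here and do not come from the studies reviewed. The approximation also treats students as independent observations, so it understates the uncertainty when intact classes are the real unit of assignment.

```python
import math

def effect_size_standard_error(d, n_per_group):
    """Approximate standard error of a standardized mean difference d
    for two groups of equal size (large-sample approximation)."""
    n = n_per_group
    return math.sqrt(2.0 / n + d * d / (4.0 * n))

def approx_95_interval(d, n_per_group):
    """Rough 95% interval around d, using the normal approximation."""
    se = effect_size_standard_error(d, n_per_group)
    return (d - 1.96 * se, d + 1.96 * se)

# 20 and 35 are the size thresholds mentioned in the text; 300 matches the
# "very large" studies. The effect size 0.3 is purely illustrative.
for n in (20, 35, 300):
    low, high = approx_95_interval(0.3, n)
    print(f"n per treatment = {n:3d}: interval roughly {low:+.2f} to {high:+.2f}")
```

Under these assumptions, an effect of 0.3 standard deviations cannot be distinguished from zero with 20 or 35 students per treatment, while a study with 300 students per treatment yields an interval that excludes zero.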



Moreover, the great diversity of subject-matter, grade levels, and types of software included in 
this aggregation of studies limits the utility of a concept of "mean effect size." Just as one can study
too narrow a range of implementations to be able to make valid generalizations across different school
settings, one can mix too much diversity into a single pot. For the purpose of providing policy-relevant
data for schools and school districts that are considering alternative ways to use computers, it seems
wrong to combine studies of math C.A.I. in elementary school, writing activities in high school, and
logic and problem-solving puzzles in the middle grades. 

In Roblyer, Castine, and King, the number of longer-term studies of at least several classes of
students in any one age range, for any one subject-matter, and using any one type of software is quite 
small. For example, 26 studies covered students mainly in grades 5 through 8. Of those, 17 passed the 
minimal size and duration criteria discussed in the previous paragraph. But among those 17 studies were 
8 studies of mathematics using C.A.I., 4 studies of problem-solving using Logo or various logic practice 
puzzles, 6 studies of reading C.A.I., and 5 studies of language arts C.A.I.



Requirements for internal validity further reduce the number of studies of any one subject-matter
substantially. Of the eight studies of middle grade mathematics, for example, only one (a small study
of two classes per treatment) involved random assignment of students to computer and non-computer
classes, only that study and one other used a design where the same teacher taught both the "computer"
and "non-computer" classes, and only two other studies may have partially compensated for non-random
assignment and teacher differences between treatments by being very large (e.g., 300 students per 
treatment or more). 

And each of these studies was conducted under different conditions, collecting different data 
about student achievement and implementation characteristics, by different researchers. 



Field Experiment at a Distance. Another approach to developing an understanding about the 
effectiveness of typical current practices of using computer-based instruction under a modestly limited 
but still representative set of conditions is to conduct simultaneous experiments in a great many schools,
employing a high-quality research design, and collecting identical data from site to site on many of the 
conditional variables that might affect the experiment's outcomes. Of course, research designs involving 
randomized assignment are difficult to implement because many other conflicting considerations affect 
the willingness and ability of schools to adapt to researchers' needs and preferences. For example, many 
schools with existing computer-based programs feel obliged to give all students access to their program. 
However, not every school finds the research requirements for experimental designs impossible to meet. 
Many teachers and administrators, for one reason or another, are willing to make necessary adjustments 
in their prior scheduling and assignment practices to allow high-quality research designs to occur. The 
trick is to identify those research-amenable schools and develop a model for conducting research that 
permits their involvement, wherever they might be located.

For this research project, our goal was to recruit from the tens of thousands of schools and
hundreds of thousands of classrooms around the country about 50 pairs of classes to participate in the 
first of several planned "National Field Studies of Instructional Uses of School Computers." This study
was to focus on mathematics instruction in grades 5 through 8 in both elementary and middle/junior-high
school environments. Participants had to meet several requirements: sufficient computer hardware so 
that a computer-using mathematics class could have regular access to at least one computer for every 
four students in the class; sufficient computer software for students to receive a substantial computer
"treatment" during the year (e.g., 30 hours of computer time per child); computer-knowledgeable teachers
with two years of computer-use experience; an alternative, "traditional" instructional approach that would
have to be applied in one-half of the participating classes and would have to address the same
curriculum; and some form of randomized or matched assignment of students to participating pairs of
classes to ensure equal-ability classes, which would, in turn, be randomly assigned to the
"computer" or "traditional" treatment. Where teachers were responsible for several sections of same-grade,
same-level mathematics, each teacher could act as his or her own control; where teachers taught
self-contained classes, two participating (and thus computer-knowledgeable) teachers would be randomly
assigned to a treatment. Students in the "traditional" classes were free to use computers for activities
outside of mathematics. But in mathematics, the distinction between the computer and traditional 
approaches was to be maintained from September, 1987 to May, 1988. The selection of computer 
software and the fit between software and the curriculum was left up to each school. This was to be 
a study of the effects of computer use as actually practiced across the United States under reasonably 
rich but not atypical circumstances during the 1987-88 school year. 
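
The assignment requirement described above can be sketched roughly as follows: students are ranked on a pretest and grouped into adjacent pairs, one member of each pair goes at random to each class, and a second random draw decides which class receives the "computer" treatment. This is only one way to implement "randomized or matched assignment." The function name, the use of pretest scores for matching, and the made-up scores are assumptions for illustration, not the procedure that any participating school necessarily followed.

```python
import random

def assign_pair_of_classes(pretest_scores, seed=None):
    """Split students into two matched classes, then randomize treatment.

    pretest_scores: dict mapping a student id to a pretest score
                    (hypothetical input format).
    Returns a dict mapping "computer" and "traditional" to lists of ids.
    """
    rng = random.Random(seed)
    # Rank students so that neighbors have similar starting competence.
    ranked = sorted(pretest_scores, key=pretest_scores.get, reverse=True)
    class_a, class_b = [], []
    # Within each adjacent pair, chance decides who goes to which class.
    for i in range(0, len(ranked), 2):
        block = ranked[i:i + 2]
        rng.shuffle(block)
        class_a.append(block[0])
        if len(block) == 2:
            class_b.append(block[1])
    # A second random draw assigns the two classes to treatments.
    classes = [class_a, class_b]
    rng.shuffle(classes)
    return {"computer": classes[0], "traditional": classes[1]}

# Made-up pretest scores for 46 students (two classes of 23).
scores = {f"s{i:02d}": 40 + i for i in range(46)}
assignment = assign_pair_of_classes(scores, seed=1987)
print(len(assignment["computer"]), len(assignment["traditional"]))
```

Matching before randomizing keeps the two classes close in starting ability even when a class is small, which is the purpose of the "equal-ability" requirement.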

The researcher's role, accomplished largely by telephone and mail communication, was to obtain
the involvement of the best combination of schools and teachers to study the question; mandate the 
experimental design, assuring the appropriate mixture of forced equivalence and randomization in student 
and teacher assignments; provide appropriate pre- and posttests; and collect data from teachers and 
students about their background and about the details of their mathematics class experience during the 
year.

We recruited schools primarily through announcements in education publications and direct mail
requests sent to research directors, other administrators in school systems, a representative national
sample of school-level administrators, and 40 publishers of educational software. Sixty-eight school
districts or individual schools indicated their willingness to participate and ability to conform to the
research design, and 27 of those "applications" were accepted. Altogether 96 classes (48 pairs of
"computer" and "traditional" classes), taught by 56 teachers in 31 schools from 25 districts in 16 states
participated in the first year of the study. Each school had between 2 and 8 participating classes. The 
classes were distributed across grade five (26 classes), grade six (30), grade seven (24), and grade eight 
(16). Two pairs were in parochial schools; the remainder, public schools. 

The schools in the study spanned the nation, but a slight majority were located on the East
Coast. One was in New England. Three were in the New York metropolitan area (2 in Brooklyn). Five
were in Pennsylvania or Maryland (two of those in metropolitan areas). Two were in Greenville, N.C.,
a relatively poor community of under 50,000. Five schools were located in Florida, spread out among
the Tampa area, the less wealthy northern part of the state, and Palm Beach. Two schools were in
Kentucky (one of those on a military installation), and one was in Texas, near Dallas. Seven schools
were in the Midwest, primarily in small communities or suburban areas and in the Iowa cities of Des
Moines and Davenport. The Pueblo, Colorado area had two participating schools, one school was near
Portland, Oregon, and two schools from California were in the study, one in a small town north of Santa 
Barbara and one in suburban Los Angeles. 

After the conclusion of the first year, we encouraged schools that could employ the "same-teacher-control"
design to participate in a second year of the experiment, and eleven teachers from nine
schools agreed to do this. The second year of data includes five 5th grade, four 6th grade, and one 7th
grade pair. Data from the other 7th grade pair of classes was not analyzed because of technical
problems.

In selecting schools to participate from among those recruited, we had two basic criteria: (1) that
the conditions of the study seemed like they could be fulfilled (i.e., that the experimental design could
be implemented, that the teachers had computer experience, and that there was sufficient "good-quality" 
hardware and software available and access time to the equipment); and (2) that the overall study 
population had a representative balance in terms of geographic location, socio-economic context, grade 
level, and types of hardware and software in use. In addition, consideration was given to situations that 
could provide the strongest design (same-teacher control rather than randomly assigning teachers to 
treatments) and to proximity to the researcher's location. 



With only a three-month recruitment period and a goal of having roughly 50 pairs of classes 
across four grade levels, it was not possible to produce a set of sites that was satisfactory on all 
accounts. In particular, we were dissatisfied with the small number of large-city school districts we 
could incorporate (one) and with the relatively few schools and classes with many black students. 
Blacks numbered under 10% of the enrollment in 22 of the 31 schools. At five of the schools, blacks 
were at least 25% of the enrollment; but only one had a majority of black students. Other minority 
groups were better represented. Hispanics were a majority in two schools, and Asian-Americans 
constituted one-third of the population at another school. Altogether, minority groups were 20% or more 
of the enrollment at one-third of the schools in the study. 

Socio-economic representativeness was reasonably satisfactory. At only five schools was 
enrollment largely upper-middle class. At those schools, the principal reported that nearly all parents 
were professionals or white-collar workers with a majority of families earning more than $30,000 per 
year. But the other 26 schools constituted a more typical blue- and white-collar heterogeneous mixture. 
In none of the others were white-collar parents clearly in the majority and in all but one the estimated 
high-income proportion was under 50 percent. Two kinds of communities, though, were poorly
represented in the study: only two schools were in very low-income areas with many families receiving
financial assistance, and no schools were in rural areas enrolling mainly children from farming families.

The classes participating in the study constituted a broad mixture of student ability levels within 
each school's population. In year one, fifteen pairs were heterogeneous mixtures of their school's grade 
level population; 10 more represented the middle-range of students in the school, but not the upper- or 
lower-thirds; eight represented the school's "upper-third" (one including the middle as well); and fifteen 
represented mainly "below-average" or "bottom-third" students or a range including middle and bottom- 
thirds. (See Table 1.) Seventeen pairs were in elementary schools, 27 pairs in middle or junior-high 
schools; and four in K-8 schools. Sixth-grade classes were split among all three types. 

The one other aspect of the study population that fell short of our goals was in teacher experience 
in using computers for mathematics. Although most of the teachers recruited had previously used 
computers in some way (generally to do word processing or teach computer literacy), only about half 
of them had used computers in mathematics teaching and only one-quarter had done so on a regular 
(more than weekly) basis. Thus, one important characteristic of the study's first year is that it was an 
examination of student achievement gains during the implementation year of an instructional program. 
One major purpose of the second year's data collection was to determine whether effects were different 
in a setting that would be regarded by participants as being more routine. 

On the other hand, we exceeded our initial expectations in terms of recruiting schools able to 
establish the highest quality research designs (student-level randomization, same-teacher control) and 
having a very favorable computer-student ratio. The ideal design of randomized assignment of individual 
students to "computer" and "traditional" treatment classes was accomplished in about one-third (11) of 
the schools participating in the study. At all of the other sites, we came very close to approximating this 
ideal design. The most common design, for example (at 14 schools), involved randomized assignment
of classes to treatments, with classroom student composition being determined locally but based on a
systematically balanced or random assignment procedure, as certified by local school personnel. At two 
schools, adjacently ranked homogeneously grouped classes were randomly assigned to treatments. At 
the remaining four schools, because of limitations on the availability of computer facilities, the class 
assigned to the computer treatment was fixed. At two of those, students were randomly assigned to
classes by the researcher; at the other two, local procedures were claimed to produce equal ability
classes. 



In 73% of the pairs of classes in the study's first year, the same teacher taught both a 
"traditional" and a "computer" class categorized as "same ability" either by random assignment or by 
a local randomization or matching procedure. In most of the remaining pairs of classes, two eligible 
teachers ("computer knowledgeable") taught self-contained classes of same-ability students, one teacher 
(randomly selected) teaching mathematics using computers, the other teacher using traditional 
instructional media. In the (4) remaining pairs, teachers taught both "computer" and "traditional" classes 
but of different ability levels; formally, each teacher was paired twice with another teacher from the 
same school who taught two classes at the same level but using the opposite "treatment." In year two, 
all pairs involved the same teacher teaching matched-ability computer and traditional classes. 

(The superiority of same-teacher controls over random assignments to treatment between teacher 
volunteers is suggested by the fact that same-teacher pairs had effect sizes whose standard deviations 
were much smaller than those for "different teacher" pairs. That is, pairs where teacher effects were 
added to treatment effects produced much larger differences in mean achievement between "computer"
and "traditional" classes. The standard deviation of effect sizes for same-teacher pairs was, on average,
only 69% as large as for different-teacher pairs across the four achievement variables used.)

Except at one school, computer-using classes in the study had the intended minimum of one
computer available for every four students in the class. Many of the classes had substantially more 
computer resources available. The median number of computers simultaneously accessible to computer- 
using teachers in the study's first year was 16. The median class-size of computer-using classes was 
23. All but five classes had no worse than a 2 to 1 ratio of students to computers. During the second 
year, all classes had at least a 2 to 1 ratio or better. 



All participating schools used microcomputers rather than mini-computer systems. Only 8 
schools used computers on a local-area network, and only one school (2 pairs of classes each year) used 
an individualized, computer-managed, "integrated instructional system." At most sites, teachers or 
students had to load programs into each computer's disks individually. Twenty-three of the 31 schools
used software that ran on Apple II series computers; nine used software for I.B.M. and compatible
computers. (Two schools used both types of machines.) One school used Texas Instruments cartridge 
software along with its Apples. And one school used networked Radio Shack TRS-80 hardware and 
software. 



The mathematics that was taught using computers and the emphasis on computation, concepts, 
applications, and general problem-solving skills varied from grade to grade, teacher to teacher, and site 
to site. Across the two years of the study, the most commonly used software products were the two 
series from Milliken Publishing Co. (Milliken Math and WordMath, together constituting approximately
19% of all software used), MECC software (13%), IBM Math Concepts and Math Practice series (12%), 
Sunburst problem-solving software (7%), and the managed network-based system from E.S.C. (now 
Jostens Learning Corp., 6%). Other software that was used by three or more classes for a substantial 
portion of their computer time included the Mathematics Curriculum Project software from the 
University of Northern Iowa (U.N.I.), Scholastic Software ("Math Shop"), S.R.A. Math, Softwriters' 
Skills Bank, and Educational Activities software. The single diskette most widely used across the classes
in the study was MECC's "Number Munchers." Although most classes used primarily drill-and-practice
programs, programs such as U.N.I.'s Math Curriculum Project and the Jostens E.S.C. program had
substantial tutorial (explanation and demonstration) functions. Problem-solving tasks built into programs 
such as Sunburst's Factory, Survival Math, and King's Rule; Scholastic's Math Shop; and the Math 
Activities Courseware series from Houghton-Mifflin were each part of the computer experience at several 
sites. 

Although nearly every "computer" class had students use computers at least once each week, 
variations among sites in the amount of time students used computers over the course of the school year 
were substantial. Some classes had only a single 30-minute period per week on the computer. One
class, at the other extreme, used computers for an hour nearly every day during the school year. On the 
average, students spent 36 hours during the school year on computer-based mathematics activity. Figure 
1 presents a cumulative frequency distribution of computer time for individual students across the 48 
computer-using classes in the first year of the study. 

Since students spend roughly fifty minutes per day on mathematics, the computer-specific activity 
on the average amounted to only 25% to 30% of their mathematics experience. Still, the amount of 
computer time each student in these classes had is substantially more than students in most computer- 
using middle-grade mathematics classes in the country have. (Unpublished data from a national 1989 
survey conducted by the author indicate that a student in a typical computer-using secondary 
mathematics class uses computers for about 30 minutes per week, or under 20 hours per year.) 
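
The share of mathematics time devoted to computers can be checked with rough arithmetic. The sketch below takes the 50 minutes per day and 36 hours per year from the text; the number of instructional days is an assumption (the report does not state it), so two plausible values are tried.

```python
# Rough check of the share of yearly math time spent on computers.
# ASSUMPTION: 160 to 180 instructional days per year (not stated in the report).
MATH_MINUTES_PER_DAY = 50       # from the text
COMPUTER_HOURS_PER_YEAR = 36    # from the text


def computer_share(instructional_days: int) -> float:
    """Fraction of yearly math time spent on computer-based activity."""
    total_math_hours = MATH_MINUTES_PER_DAY * instructional_days / 60
    return COMPUTER_HOURS_PER_YEAR / total_math_hours


for days in (160, 180):
    print(f"{days} days: {computer_share(days):.0%}")
```

Under these assumed year lengths the share comes out at roughly one-quarter, close to the lower end of the range quoted above; a shorter effective year would push it toward 30%.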

Students in paired traditional classes spent the same amount of time learning mathematics as 
students in the computer classes. In some of the pairs, traditional classes used the time that their paired 
computer class had for computers on activities and lessons that the computer class did not have, such 
as group projects or problem-solving tasks. In other pairs, the traditional classes just spent more time 
on the same assignments (drills, lessons, seatwork) as the computer class. Overall, in exchange for their 
computer time, the computer classes spent an estimated 40% less time on small group activities (which 
was, however, a small proportion in either "computer" or "traditional" classes), 33% less time on class 
drill, and 25% less time on whole class lessons and seatwork than did the traditional classes. 

The data collected during the first year included national standardized mathematics achievement 
test data, both pre- and posttests; daily logs (20% sample of days) from teachers in both treatment groups 
concerning the class activities; the set of homework and classwork assignments and quizzes and tests 
given to students throughout the year (a copy of each class' textbook was also provided); questionnaire 
data from teachers and students at the beginning and end of the year; a brief experimenter-made 
posttest of estimation skills and mental mathematics; and 47 experimenter-made, curriculum-specific 
posttests, each one based on the instructional content in one particular pair of classes, developed from 
the assignment and test materials supplied by the teachers throughout the year and the computer 
programs used by those students. During the second year, the same pre- and posttests and the spring 
student and teacher questionnaires were fielded as in the first year, but the teachers were asked to give 
less detailed weekly reports about computer use patterns. 

During the first year, five schools (13 pairs of classes) were visited, but because the research sites 
were spread widely throughout the country, we relied on the periodic self-reports described in the 
previous paragraph to provide systematic data about how each site accomplished instruction in computer 
and traditional classes during the year. And because no researcher was on site, it was necessary to rely 
on teacher and student written feedback to validate that the experiment was actively implemented. 

The first year's weekly reports provided information needed to create curriculum-specific posttests 
for each pair of classes. In addition, the weekly reports and the spring questionnaires of students and 
teachers enabled us to code curricular and organizational properties of the computer and traditional 
instruction classes in the absence of on-site observation by a research team. These data also allowed us 
to examine hypotheses about different aspects of instructional practice that might help account for 
differences between sites in the relative effectiveness of computer-based approaches. About 90% of the 
56 teachers participating during the first year sent back two reports each week for a 22-week period from 
November through April. 

From the teachers' weekly reports and the spring questionnaires completed by students (who 
provided their own estimates of how often they used computers for different subjects during the school 
year), we determined that during the first year of the study, a number of sites were not successful in 
implementing a substantial computer experience and in maintaining a sharp distinction between computer 
and traditional treatments. Eight teachers' computer class students reported minimal computer experience 
during the year: 15 hours or fewer, by our estimate. One of those classes had fewer computers (6) than 
any other class in the study; in another case, an 8th grade math teacher used only a few pieces of 
software that he felt complemented what he was teaching. And in two of the others, cooperation by a 
pair of experienced math teachers was quite grudging in that their participation was imposed on them 
by their principal. 

In eight other instances, at least one-quarter of the students in the traditional treatment classes 
reported having used computers for math on more than five occasions. However, further inquiry 
indicated that most of those occurrences involved computer use after the posttests, when the teachers 
were free to break down the distinction between treatments. And in all of those cases where there was 
some contamination, the computer class students reported substantially more computer experience during 
the year than did the traditional class students. Altogether, the combination of relatively little computer 
use (15 hours or fewer during the year) and indication of computer use by the traditional math class in 
many of those same classes led us to drop eight pairs of first year classes from the data analysis. In the 
classes studied during the second year, there was substantial computer use in all classes and less 
evidence of contamination. So all 10 pairs from year two were used in the data analysis, bringing the 
total number of pairs of classes studied to an even 50 (48 first year class-pairs minus 8 dropped plus 10 
second-year class-pairs). 

It should also be noted that adding the omitted eight pairs back into the analysis results in 
virtually no change in the mean value of effect statistics calculated for the main contrast between 
experimental and control treatments. All five major "effect size" measurements (see below) are changed 
by a maximum of 0.01 units (posttest standard deviations) when the eight omitted pairs are included. 
Moreover, the effect sizes measured on the pairs of classes omitted from the analysis are more clustered 
around the zero point than were the effect sizes for the other pairs, suggesting that in fact there was less 
distinction between "experimental" and "control" classes for those pairs. (And the 20 pairs evidencing 
the most faithful implementation of the design, with more than 30 hours of use and few reports of 
treatment contamination, had effect sizes that were least clustered around zero.) 

Pretests and Posttests. Standardized tests. Although a common set of posttest measures was used at 
all sites during both years of the study, in the first year of the study, different pretests were used in 
different sites. For 54 of the 96 classes (in 19 of the 31 schools), students took the Stanford 
Achievement Test (math computation and math applications parts) in September, and the tests were 
scored by the researcher's staff. The remaining schools supplied the project with other pretest data, from 
tests taken during the previous Spring or the Fall of the study year, using a variety of standardized tests 
(CTBS (3 schools); CAT (2); Iowa (1); SRA (3); Metropolitan (2); and Stanford (2)). Fall pretest scores 
were obtained from 4 schools, while scores from the previous Spring were used for students at 9 schools. 
In year two, all sites used the researcher-scored Stanford tests. Posttests each year included the math 
computation and applications sections of the Stanford. Math concepts were tested through the 
curriculum-specific test prepared for each pair (see below). 

The fact that different schools supplied different pretest data does not affect our ability to assess 
achievement gains made in computer-treatment classes compared to traditional-treatment classes. Each 
class pair's effect size is calculated based on their particular differential pretest-controlled posttest scores 
and therefore is independently measured from effect sizes in other schools. But it does limit the 
analysis in two ways. First, one cannot do careful studies of absolute achievement gain except in 
schools using the Stanford pretest. For example, only in those schools can we examine whether the 
effectiveness of the computer-based program is higher or lower for teachers whose traditional class 
gained more or less than the average teacher's. (That is, do the computer programs in use help the 
"better" teachers or the "not so successful" ones?) Also, absence of uniform pretest metrics limits our 
ability to assess the value of the computer-treatments for categories of students grouped according to 
previous academic performance: the "high achieving," "average achieving," or "lower achieving" 
students. Since we have no easily obtained common standard on which to compare the prior academic 
achievements of students in different schools, these categorizations must be treated as roughly made 
divisions. 



Curriculum-specific test. Besides the Stanford math computation and math applications posttests, three 
researcher-constructed posttests were used: a curriculum-specific test, a test of fluency in mental 
mathematics, and a test of estimation skills, the latter two combined in a single orally administered test. 
Each curriculum-specific test was produced through an informal domain sampling procedure that 
included conceptual, computational, and problem-solving tasks contained in the teachers' textbook 
assignments, worksheets, tests, and computer program assignments. The produced test attempted to 
include a balance of problems given to each class in a pair but not the other, but it de-emphasized those 
skills already covered on the standardized achievement posttests. Thus the tests focused as much as 
possible on concepts and on applying math in real and complex situations, consistent with the need to 
cover only what that teacher actually taught and to balance the experience of the computer class and the 
traditional class in that pair. 

This test was not multiple choice; instead, students were asked to supply their own answer, and 
to use the supplied test paper to do their calculations. Many questions had several parts or involved 
students supplying several answers (e.g., circling all fractions among a set of 16 that were greater than 
one-half). Related and multiple-decision questions were combined so that the test was scored as a set 
of between 15 and 22 "items" per test. Each item was allocated a number of points (most often "2" or 
"3"; sometimes "1" or "4") based on an assessment of its complexity. Rules for partial credit were 
established for answers to questions with multiple parts, steps, or decision-points. The researcher scored 
all tests. 



The content of tests varied substantially from pair to pair. There were variations in attention to 
higher-order concepts and complex applications and in the degree to which computer programs formed 
the source of test items. (Appendix A contains a sample of several of the curriculum-specific tests.) 
On average, a test was composed of about 9 items from the computer programs used by the computer 
class in the pair, about 5 items from the traditional class' special activities (where they did have tasks 
that the computer class did not) and the remaining 4 items from tests or worksheets used by both classes 
in a pair. The imbalance between computer and traditional class sources is partly due to the fact that 
teachers of many pairs did not report any assignments given only to the traditional class in the pair. In 
other cases, the content of all traditional-class-only assignments was already tested on the Stanford 
Achievement test. Also, in some cases (where teachers emphasized mathematics computations or simple 
single-step word problems in both their traditional and computer classes, or where the teacher did not 
consistently provide weekly data about assignments) we composed part of the test with problem-solving 
tasks or concept items from other pairs. 

Teachers reported an "opportunity to learn" variable for each test item; that is, they indicated for 
each test item which of their classes (computer, traditional, both, or neither) had been presented with 
instruction for which that test item was appropriate. Excluding seven pairs whose teachers did not 
provide opportunity-to-learn data for both classes, the teachers indicated that the content of 80% of the 
items had been covered in their classes. On average, the opportunity-to-learn score was 5 percentage 
points higher for computer classes than for traditional classes, indicating a slight bias in test content 
favoring the computer treatment. For part of the analysis, differential exposure to the test content 
(between computer and traditional class of the same pair) and differential opportunity-to-learn were taken 
into account in analyzing observed effect sizes. 



During year two, instead of freshly deriving a test based on the new year's instruction and 
materials, the same curriculum-specific posttest was given to the teacher's classes as was given in the 
first year. Opportunity-to-learn measures for the second year were comparable to those during the first 
year, averaging 83% for the 8 teachers responding, but with the mean for traditional classes still below 
that for computer classes. 



The curriculum-specific tests were expected to be difficult, and they were. Even with partial 
credit scoring, students in classes studied during the first year averaged only 36 percent on this test. Of 
course, the tests varied substantially in how well students could answer the questions: students in 
one-fourth of the pairs scored lower than 25%; students in 5 pairs scored over 50%. Overall, the low scores 
provide evidence that mathematical problem-solving requiring fluency in dealing with numbers and 
logical relationships is not successfully taught in most school classrooms. However, a discussion of the 
substantive mathematics education issues revealed by performance on these tests is reserved for a future 
paper. In terms of the tests' statistical properties, the item reliability of most tests was satisfactory or 
better. All but six tests had alpha reliabilities above .60. The mean reliability was .71 and the test with 
the highest reliability had an alpha value of .86. 
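
The report does not say how its alpha values were computed. As an illustration only, the sketch below assumes the usual coefficient alpha (Cronbach's alpha) applied to per-item scores; the function name and the example scores are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Coefficient alpha for a test.

    scores: one list per student, holding that student's score on each item.
    """
    k = len(scores[0])  # number of items on the test
    item_variances = [pvariance([s[i] for s in scores]) for i in range(k)]
    total_variance = pvariance([sum(s) for s in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical partial-credit scores: four students, three items.
example = [[2, 3, 1], [1, 1, 0], [3, 2, 2], [0, 1, 1]]
print(round(cronbach_alpha(example), 2))
```

Alpha rises as items covary more strongly relative to their individual variances, which is why short tests with heterogeneous items tend to show lower values.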

Mental math/estimation test. Two distinct rationales lay behind the other administered posttest. First, 
several of the computer programs used in many of these classes focused on rapid solution to basic math 
facts. Many of these were presented in an arcade-game format. It seemed appropriate to test students' 
ability to do rapid mental arithmetic, for example by presenting each problem for a fixed limited number 
of seconds. Secondly, competence in producing round-number estimates of answers, although not as 
much a part of the standard mathematics curriculum as mathematics educators recommend, also seemed 
appropriately measured by an orally administered test. These two goals were combined in a common 
test given to all pairs. Teachers presented each problem visually on a "flip chart," each one for 10 
seconds plus 5 seconds between presentations. The test contained seven mental mathematics items and 
13 estimation tasks (and a practice problem for each part). Two versions of the mental math portion of 
the test were used: one for grades 5 and 6 and the other for grades 7 and 8. The same estimation items 
were used for all grade levels. Appendix B contains the items in the Grade 7-8 version of this test. 

The orally administered test was also a "fill-in," and in the estimation part, "double" credit was 
given for optimal estimates. Scores on the mental math subtest averaged about 30% and, with the 
double-credit scoring, the average estimation score was roughly 55%. Mental math and estimation 
subscales correlated .4 with each other on the individual level and among classes. But because mental 
computation and estimation are different skills, we treat them as separate dependent variables along with 
the Stanford computation test, the Stanford applications test, and the curriculum-specific test. 

Correlations among the five posttest raw scores are substantial, although the unreliability of the 
short mental math test produces attenuated statistics. Table 2 shows the student-level posttest-to-posttest 
correlations, averaged across the 57 pairs. It also shows, in row one, the mean pretest-to-posttest 
intercorrelations, using as a pretest variable the simple sum of all pretests for that pair. All 
correlations among the pretest total, both Stanford posttests, and the curriculum-specific posttest average 
in the range of .49 to .61. Correlations with the mental math and estimation tests are all in the range 
.28 to .41. 






Pretest Match, and Calculations of Achievement Gains and Effect Sizes. Randomized assignment 
of between 30 and 60 students to any one pair of classes does not assure that the paired classes are in 
fact equal in ability, not even when students are initially stratified by test scores, as were students in 
many of the schools in this study. Furthermore, the tests used to make assignments of students to classes 
were generally not the same tests as used for the pretest. So variations between "traditional" and 
"computer" class pretest means even for classes of randomly assigned students would not be unexpected. 
Moreover, only a minority of sites actually permitted researcher-accomplished randomization. The 
remainder used local manual or district-driven computer-based assignment procedures to produce "equal 
ability" classes. In fact, quite a few pairs of classes showed pretest differences. Overall, among the 58 
pairs over both years, 24 pairs had pretest mean differences of greater than one-quarter of a standard 
deviation; 11 of those exceeded one-half of a standard deviation. Moreover, the 24 class pairs 
randomized by the researcher were somewhat less likely to show large pretest differences (> |.25| s.d.) 
than were the remaining pairs (38% vs. 45%). In addition, a greater number of large pretest differences 
favored the traditional class (i.e., indicated higher achievement levels there) than favored the computer 
class (15 vs. 9). So, for all of these reasons, it seemed particularly important to take pretest differences 
into account in computing effect sizes. 
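
To make the one-quarter and one-half standard deviation cutoffs concrete, the following sketch expresses a pair's pretest mean difference in standard deviation units. The report does not say which standard deviation was used; a within-class pooled standard deviation is assumed here, and the function names and scores are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pooled_sd(a, b):
    """Within-class pooled standard deviation of two score lists (an assumption)."""
    weighted = (len(a) - 1) * variance(a) + (len(b) - 1) * variance(b)
    return sqrt(weighted / (len(a) + len(b) - 2))


def pretest_gap(computer, traditional):
    """Pretest mean difference (computer minus traditional) in s.d. units."""
    return (mean(computer) - mean(traditional)) / pooled_sd(computer, traditional)


def classify_gap(gap):
    """Label a gap using the one-quarter and one-half s.d. cutoffs from the text."""
    size = abs(gap)
    if size > 0.5:
        return "over one-half s.d."
    if size > 0.25:
        return "over one-quarter s.d."
    return "minor"


# Hypothetical pretest scores for one pair of classes.
gap = pretest_gap([10, 12, 14], [11, 13, 15])
print(round(gap, 2), classify_gap(gap))
```

A negative gap means the traditional class started ahead, the more common direction among the large differences reported above.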

Consequently, the performance gains accomplished by each student during the school year studied 
were measured by computing posttest raw scores for that student net of their own pretest-indicated 
performance level. Separate regression equations were calculated for each of the five posttest 
measurements used (Stanford computation, Stanford applications, curriculum-specific, mental math, and 
estimation), and separate regressions were computed for class pairs receiving different pretests or the 
same pretest but at different grade levels. In year one, for example, the largest pretest group was formed 
by 10 pairs of 7th grade students pretested in the Fall with the Advanced version of the Stanford pretest. 
Because there were so many combinations of pretests and grade levels in year one, distinct regression 
equations were calculated for 20 groups of classes. Most of those were calculated across students in 
only one pair of classes (e.g., two 5th grade classes taking the CTBS). In year two, since only 
Stanford pretests were used, groups were formed merely by grade level. In year one, all pretest 
subscales that existed for that particular pretest (e.g., computation, concepts, and/or applications) were 
used as separate predictors for each posttest outcome measure. In year two, each Stanford sub-test was 
regressed only on the corresponding Stanford pretest. Parameters of each equation were then applied 
to each student in the group's classes, yielding a residual posttest score (actual posttest minus posttest 
score predicted from the regression equation). 
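
The residual-score step can be sketched as follows. This is a minimal illustration for the single-predictor case (as in year two, where each Stanford sub-test was regressed on the corresponding pretest); the year-one regressions used several pretest subscales as predictors, which this sketch does not attempt. Function names and scores are hypothetical.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares slope and intercept for one predictor."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def residual_scores(pretest, posttest):
    """Actual posttest minus the posttest predicted from the pretest."""
    slope, intercept = fit_line(pretest, posttest)
    return [y - (intercept + slope * x) for x, y in zip(pretest, posttest)]


# Hypothetical scores for all students in one regression group
# (computer and traditional classes pooled).
pre = [20, 25, 30, 35, 40]
post = [22, 30, 31, 40, 42]
residuals = residual_scores(pre, post)
print([round(r, 2) for r in residuals])
```

Because the equation is fitted across both classes of the group, the residuals sum to zero overall; a treatment effect shows up as a difference between the two classes' mean residuals.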

Students who did not have at least one pretest sub-scale were not included in the analysis. 
Students who were added to the class during the first part of the school year (through November) were 
included if pretest scores were available for them. Students who were added later, who changed between 
"traditional" and "computer" class sections during the year, who left the class prior to the posttest, or 
who were absent from any one posttest were excluded from calculations of the effect size for that 
posttest, although they were included in descriptive statistics about the class. 

Altogether, pretest data were obtained for 2919 students (combining both years). Of those, at 
least one posttest was scored for 89%. Attrition for each specific posttest varied between 14% and 18%. 
(One teacher did not administer the mental math/estimation test and one teacher's curriculum-specific 
posttest was not scored because of clear evidence of test taking misbehavior.) 

For each pair of classes an effect size was calculated for each posttest by computing the 
difference in mean residuals between the computer and traditional class and dividing that difference by 
the pooled raw posttest standard deviation for both classes. 
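
The effect size definition above can be written out directly. In this sketch "pooled" is read as the within-class pooled standard deviation of the raw posttest scores, which is one reasonable interpretation rather than a detail confirmed by the report; the names and numbers are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pooled_sd(a, b):
    """Within-class pooled standard deviation of two score lists (an assumption)."""
    weighted = (len(a) - 1) * variance(a) + (len(b) - 1) * variance(b)
    return sqrt(weighted / (len(a) + len(b) - 2))


def effect_size(resid_computer, resid_traditional, raw_computer, raw_traditional):
    """Difference in mean residuals divided by the pooled raw posttest s.d."""
    difference = mean(resid_computer) - mean(resid_traditional)
    return difference / pooled_sd(raw_computer, raw_traditional)


# Hypothetical residual and raw posttest scores for one pair of classes.
es = effect_size([1, 2, 3], [-1, -2, -3], [10, 12, 14], [11, 13, 15])
print(es)
```

A positive value favors the computer class; dividing by the raw (not residual) standard deviation keeps the result in ordinary posttest standard deviation units.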

Major Results: Effect Sizes for the Study Population. Table 3 gives the means and standard deviations 
of the effect sizes observed for each posttest outcome for the full 58 pairs of classes (actually 57 data 
points because one teacher taught two pairs of classes which were combined for the analysis). The table 
also provides mean effects for the 50 pairs judged to have implemented the study design satisfactorily, 
for the 20 pairs judged to have implemented the design most faithfully, for the 9 pairs studied during 
their second year of using computers as part of the study (one second year teacher was a "traditional" 
teacher during year one), for the 24 pairs that incorporated researcher-controlled student-level 
randomization in their design, and for the 29 pairs whose pretest differences between the traditional and 
computer classes of the same pair were "minor" (under .25 s.d.). 

For the study population as a whole, effect sizes for all five outcome variables were negligibly 
different from zero. For all 57 pairs, they ranged from -.02 to +.07. For the satisfactorily implementing 
pairs, they ranged from -.07 (mental math) to +.07 (estimation). However, for the 20 most faithful 
implementations, the effect size means were somewhat more positive (none was less than zero), ranging 
from +.03 to +.18, although only for the estimation outcome was ES > .10. 

Teachers in the second year of the study had more success, in terms of effect sizes, than did the 
pool of teachers in their first year of the study. However, the teachers continuing for a second year were 
not a representative sample of first year participants. And when we compare effect sizes in their second 
year classes with those of their own first year classes, the results for the second year clearly show a 
decline in effect sizes, not an improvement. (See Table 3.) 

When we look at the group of sites where student-level randomization was accomplished under 
the researcher's control, we see a more consistently positive set of effect sizes. But only for the 
estimation posttest was ES > .10. And when we examine only those pairs with 
a close pretest match between traditional and computer classes, we see modest departures of effect sizes 
from those for the full study population, but in both positive and negative directions. 

The final element in Table 3 reports the results of a multiple regression analysis of three of these 
methodological factors (all except "second year teachers") on student achievement effect sizes among 
the 50 pairs of classes that met our minimal criteria. The numbers shown are predicted effect sizes for 
a class-pair having the best of these implementation/design attributes: a faithful implementation (frequent 
computer use and no treatment confounding), researcher-controlled student-level randomization, and 
minor pretest differences. The predicted effect sizes for all achievement outcomes are above zero, and 
four of the five are near or above +.10. Still, only one predicted ES is above .20, which is a lower 
bound for what might be called a substantively important effect. 
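
The prediction for a "best case" class pair can be illustrated with a small least-squares sketch. It assumes the three methodological factors enter as 0/1 indicators with additive effects, which the report does not spell out; the data below are invented so that the arithmetic is easy to follow and are not the study's values.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Least-squares coefficients; each row is [1, faithful, randomized, minor_gap]."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical class pairs: (faithful, randomized, minor pretest gap) -> effect size.
pairs = [
    ((0, 0, 0), -0.10), ((1, 0, 0), 0.00), ((0, 1, 0), -0.05), ((0, 0, 1), -0.02),
    ((1, 1, 0), 0.05), ((1, 0, 1), 0.08), ((0, 1, 1), 0.03), ((1, 1, 1), 0.13),
]
rows = [[1, *flags] for flags, _ in pairs]
coefs = ols(rows, [es for _, es in pairs])
predicted_best = sum(coefs)  # intercept plus all three indicator effects
print(round(predicted_best, 2))
```

The predicted value for a pair with all three attributes is simply the intercept plus the three coefficients, which is the kind of quantity reported in the last row of Table 3.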

In summary, for the study population as a whole, even when we take into account that many sites 
had weak implementations or less than ideal study designs, the overall effect sizes, although generally 
above zero for the methodologically superior implementations, are not substantially above zero, except 
for the estimations subtest, the outcome variable with the highest standard error. We postpone until later 
in this paper a discussion of the implications of these results, but one thing is certain: we cannot 
conclude from these results that "computers are a waste of money." First, the sample, although probably 
more representative of the range of actual practice than can be found elsewhere, is still not a 
sophisticated national probability sample of teachers and classrooms. Second, 5th through 8th grade 
mathematics is only one curricular application of computers. And third, there are other potentially 
valuable ways to improve students' understanding of mathematics through computers besides the 
typically diskette-based drill-and-practice programs that constitute the most common approach employed 
with our study population. Still, on the average, for this population of teachers and students who used 
computers as they did, it seems that, for the group considered as a whole, the use of computers did not 
make much difference for the students' performance on tests of mathematics skill and applications. 

Differences in Effect Sizes: Are They Random? Although the mean effect size among these pairs of 
classes across five outcome variables was fairly close to zero, not all effect sizes were close to zero. 
One class pair had an effect size of over +1.00 for four of the five outcome variables, and another class 
pair had an effect size below -.60 for three of the five variables. A third pair's effect sizes were +.62 
for Stanford computation, +.43 for the curriculum-specific test, and +.63 for the mental math test, but 
(negative) -.43 for Stanford applications. The standard deviation of effect sizes ranged from .35 for the 
Stanford subtests to .65 for the estimation test. 



Effect sizes based on any one pair of classes are subject to a variety of situational effects 
independent of the actual effects of instructional experience. Moreover, even if one were to use 
randomly produced test scores, effect sizes computed on the basis of a single pair of classes have a 
non-negligible chance of being greater than one-quarter standard deviation. (Using a Monte Carlo simulation, 
I computed effect sizes for this study population based on random test scores, pretest controls, and pre- 
and posttests correlated between .4 and .6 and found the typical standard deviation of effect sizes to be 
about .27.) Thus, effect sizes for single class pairs between, say, -.3 and +.3 are hardly meaningful, and 
even those in the range of |.3| to |.5| are not statistically significant. However, because the standard 
deviation of effect sizes observed was larger than that likely to be obtained by chance, it is plausible that 
there are some systematic patterns of effects, that is, that the variations are not merely due to random 
fluctuations. By combining class pairs that are similar on some characteristic (for example, grade level, 
type of software used, frequency of computer use, etc.) we can produce empirically based speculations 
about what factors make a difference in the effectiveness of the range of computer-based approaches to 
middle grade mathematics instruction employed by the schools in our study population. 
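
A simulation of this kind can be sketched as follows. The details of the original Monte Carlo run are not given, so everything specific here is an assumption: 25 students per class, normally distributed scores, a pre/post correlation of .5, no true treatment effect, and the standard deviation of the pooled raw posttest scores as the divisor.

```python
import random
from math import sqrt
from statistics import mean, stdev


def simulate_null_effect_size(n_per_class, rho, rng):
    """Effect size for one simulated class pair with no true treatment effect."""
    pre, post = [], []
    for _ in range(2 * n_per_class):
        x = rng.gauss(0, 1)
        # Posttest built so that its correlation with the pretest is rho.
        y = rho * x + sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        pre.append(x)
        post.append(y)
    # Regress posttest on pretest across both classes, then take residuals.
    mx, my = mean(pre), mean(post)
    slope = (sum((a - mx) * (b - my) for a, b in zip(pre, post))
             / sum((a - mx) ** 2 for a in pre))
    resid = [b - my - slope * (a - mx) for a, b in zip(pre, post)]
    computer, traditional = resid[:n_per_class], resid[n_per_class:]
    return (mean(computer) - mean(traditional)) / stdev(post)


rng = random.Random(0)  # fixed seed so the run is repeatable
sizes = [simulate_null_effect_size(25, 0.5, rng) for _ in range(2000)]
spread = stdev(sizes)
print(round(spread, 2))
```

Under these assumptions the spread of chance effect sizes comes out at roughly one-quarter of a standard deviation, the same order of magnitude as the figure quoted in the text; smaller classes or weaker pretest correlations would widen it.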






Further analyses with these data will examine seven types of variations for clues concerning the 
differential effects on student achievement of typically employed computer-based approaches to middle 
grade mathematics. Those seven categories are (1) school and community environments, (2) student 
characteristics (including grade level), (3) teacher characteristics, (4) the social organization of computer 
use, (5) curriculum coverage, (6) computer software, and (7) computer hardware and hardware 
organization. For each of these seven categories, there are at least several variables that are plausibly 
linked to possible variations in the effectiveness of computer-based instructional programs in 
mathematics, over the domain of practice that was studied. For example, take student characteristics. 
Do the computer-based approaches that teachers are now using work better with the younger students 
in grade 5 or the older ones in grades 7 and 8? What about student ability levels: are the 
computer-based approaches now in use better suited for students behind grade level in math achievement or for 
their on-grade or above-grade peers? Or take teacher characteristics. It is plausible that teachers 
responsible for teaching several subjects to a self-contained class may profit by using computer-assisted 
instruction more than math specialists. On the other hand, perhaps math specialists make better use of 
computer-based tools in the subject that they know best. Again, we emphasize that the questions we 
address concern variations in effectiveness in terms of the range of practices actually studied. We cannot 
say what effects better-prepared teachers would produce or whether the effects of software more 
carefully developed to elicit mathematical understanding would be greater than the effects found for the 
software actually in use by the teachers in these 50 pairs of classes. 

TABLE 1 

The 48 Pairs of Classes: Grade Levels and Ability Levels 

Class' Ability Level                            Grade Level 
(as reported by school)                    5th    6th    7th    8th 
------------------------------------------------------------------- 
Top 1/3 of their school's grade level       -      3      4      - 
Top and Middle Thirds                       1      -      -      - 
Middle Third                                -      5      4      1 
Heterogeneous                              11      4      -      - 
Middle & Lower Thirds                       1      1      -      2 
Below Average                               -      -      1      3 
Lower Third                                 -      2      3      2 

TABLE 2 

Pre- and Post-Tests: Mean Individual-Level Correlations Among Class Pairs (N=57 class pairs) 

                           Stanford       Stanford        Curriculum- 
                           Computation    Applications    Specific Posttest    Mental Math    Estimation 
Pretest (sum of scores)    .56            .61             .55                  .37            .39 
Stanford Computation                      .54             .49                  .38            .34 
Stanford Applications                                     .53                  .28            .37 
Curriculum Specific                                                            .36            .41 
Mental Math                                                                                   .39 
Estimation 

TABLE 3 
Aggregate Effect Sizes 

                                                Stanford          Stanford          Curriculum- 
                                                Computation       Applications      Specific Posttest Mental Math       Estimation 
                                          N*    Mean    s.d.      Mean    s.d.      Mean    s.d.      Mean    s.d.      Mean    s.d. 

All Pairs                                 57    +.04    .35       +.04    .35       -.00    .43       -.02    .55       +.07    .65 
Pairs Kept in Study                       50    +.03    .36       +.04    .36       -.01    .46       -.07    .54       +.07    .68 
Most Faithful Implementations             20    +.07    [?]       [?]     .40       +.04    .56       +.06    .59       +.18    .77 

Teachers in Second Year                   9     +.11    .32       +.18    .42       -.02    .56       +.01    .32       +.16    .41 
Same Teachers in their First Year         [?]   +.31    .34       +.13    .40       +.23    .36       +.11    .57       +.32    .90 

Researcher-Randomized at Student Level    24    +.06    .35       +.08    .44       +.08    .41       +.10    .60       +.26    .75 

Only minor pretest differences between 
  traditional & computer classes          29    +.03    .38       +.11    .41       +.02    .46       -.17    .58       +.08    .77 

Regression output: linear model 
  prediction for the following condition: 
  most-faithful implementation, 
  researcher-randomized, minor pretest 
  difference between classes              df=45 +.09              +.17              +.15              +.01              +.35 

* Ns for some posttests are 1 fewer because of occasionally missing data. 
[?] = entry illegible in the source copy. 

FIGURE 1. Hours that any one student used computers.

(N=48 classes, year one)
APPENDIX A: Selected Curriculum-Specific B



Johns Hopkins
P3 I Pollock



National School Year Review
Class ID:            Name:



1 There are 98 pictures in Pam's photo
album. There are 8 pictures on all of the
pages except for one special page that has
10 pictures instead of 8. How many pages
are in the album, including the special
page?




2 If a baseball 



a A scorecard costs $1.25. Paula bought one
and gave the seller a $5 bill. How much
change should she get?




b The White Sox have played 25 games. The
9 starting players together have made 225
hits so far. What is the average number of
hits that a White Sox starting player has
had so far?



ANSWER: 
hits 



3 Complete the table below by showing the
average scores.

Bowling Scores

             June    Zach
Game 1       168     165
Game 2       142     145
Game 3       154     160
Average      ____    ____









4 Draw all lines of symmetry for each figure:

a b



5 Look at these figures. 



How many diagonals does a 6-sided figure
have?



ANSWER: 



a 
6 The square below can be rotated around its

Question: Can it be rotated so that it looks
like these squares below?

Circle Yes or No for each one.
If Yes, tell how many degrees it
should be rotated (clockwise).

Yes / No    If "Yes," how many degrees? ____

Yes / No    If "Yes," how many degrees? ____

Yes / No    If "Yes," how many degrees? ____



7 a How many degrees?

We'll call these
inside degrees

and this,
turning degrees

b How many turning degrees in all?

c How many degrees on the inside of a triangle?

d How many turning degrees in a triangle?





8 Round these decimals as specified 
a 5.9610 to the nearest 10th 
b 4.0019 to the nearest 100th 
c 5.595 to the nearest 100th



9 Corn Plant

[Graph: height in centimeters (vertical axis, 10 to 100) against weeks (horizontal axis, 0 to 10)]

a During which weeks was there the most
growth?

b Approximately how tall will the corn
plant be after 9 weeks?

____ cm.



10 Complete these fractions so that they are
less than, but as near to, the value of 1 as
they can be.

14 Solve these fraction problems:



6 



9 



□ 



11 What size fraction is halfway between these
fractions?

a halfway between [illegible] and [illegible]:
b halfway between [illegible] and [illegible]:
c halfway between [illegible] and [illegible]:



12 Write <, > or = for each pair of fractions
to make a correct statement.






•*o* 







13 Two points on this number line have been
marked. What is the value of the point
labelled "X"?

[Number line with marked points 7.5825, X, and 7.635]



ANSWER 



3 


4 


10 


20 


7 


3 


•J ♦ 


J 


10 


5 


12 " 


6 



3 
J 



1 



15 You are having 17 people over for a birthday
party. A 2-liter bottle of soda is enough
for 3 people. How many bottles of soda do
you need to buy?



ANSWER 



16 



You have been hired to program a
chicken-scratch-making machine. It can
make two kinds of marks.

It makes a | when you command it to
CHIRP. It makes a - when you command
it to CHEEP.

You can also program it to make a pattern
of marks by giving a name to a set of
commands. For example,

GOOX = (CHIRP CHEEP CHIRP).

Then when you command it to GOOX it
would make | - |.

You can use GOOX in future instructions
to the machine. The commands
CHEEP GOOX CHEEP would make

a Give a set of 3 commands that would make

We'll call that GLEEK.

b What is the simplest set of commands
(CHIRPs, CHEEPs, GOOXs, and
GLEEKs) that would make:

|-|-||-||
1 Multiply or divide

a 3.266*100

b .045
x 2.1

c .03 ) 2.727



Write each set of fractions in order from
smallest to largest.

a 1/2, 2/3, 3/5

b [illegible]/4, [illegible]/5, [illegible]/10




6 Add or subtract and write the answer in 
lowest terms. 



Jeff loves a ride at the amusement park 
called the Merry Backbreaker. The ride 
moves 26 meters per second and thrills 34 
people for half a minute. How many meters 
will the ride travel altogether? 



meters 



3 Circle all factors of 39 in the grid below.

 95    52   114    65    15
 27     1    47   139    27
 35    26    72    13     1
 12    19   139    35    18
  1    52    42    63    39



4 Circle the prime numbers.

  1     8     5     3    11    12    16
 19    21    28     4     2     7    13
 31    39    18    14    15    30    36
 42     9    17    22    23    25    27
1 + J.
5 15 



9 + 6 



_8 
11 



l 

2 



d ± -J. 

16 24 



(4 ♦+>(*♦+) 



8 Express as a terminating or repeating
decimal.

5/6

48/200




42 



9 Solve these proportions for n.

48/64 = n/28

3/n = 30/105

n =



10 In a map of a city drawn to scale, 1 inch represents
2 [fraction illegible] miles. If the city's size is a
rectangle 13 3/[illegible] miles by 6 1/[illegible] miles, how
big is the map?

____ in.    ____ in.

11 Express this fraction as a %. If necessary,
round your answer to the nearest tenth of
a percent.

23/32

13

0.82 + 23.706 + 520

124.52 - 38.081

14 A runner completed the 200 meter race in
22 seconds.

a What was the speed in meters per second,
to the nearest tenth?

b What was the speed in kilometers per
hour to the nearest tenth?




12 Arrange the numbers from least to greatest.

15 63% of 285 is represented by which formula?
Fill in the square next to the correct one.

285 - 63 = 222.00       □
63 ÷ 285 = 221.05       □
.63 x 285 = 179.55      □
.63 ÷ 2.85 = .22105     □
.63 x 2.85 = 1.7955     □



16 Each of these magazines increased their
circulation between 1980 and 1985 by
20,000 copies. By what percent did each
magazine increase its circulation? Complete
the table.

                         Circulation          Percent
Magazines             1980        1985        Increase

Gone Fishing          40,000      60,000      ____
Reading for Fun        5,000      25,000      ____
Teen World           120,000     140,000      ____
17 A calf increased in weight from 10 lb. to 60 lb.
What percent increase was that?




18 A survey was taken of residents of California
to determine their views on the construction
of a proposed highway. The results of the
survey showed that 31% favored construction,
60% opposed construction, and the rest
had no opinion.

If only 18 people said they had no opinion,
how many people in all were surveyed?




19 These four sets of numbers follow the
same kind of sequence. (In any sequence,
each number differs from the next in the
same way.)

• 2 -> 8 -> 32
• 5 -> 10 -> 20
• 1 -> 6 -> 36
• 2 -> 6 -> 18

These three sets follow a sequence, but not
the same kind of sequence, as the ones above.

• 4 -> 7 -> 10
• 1 -> 5 -> 10
• 3 -> 6 -> 9

Tell whether each of these follows the same
kind of sequence as the first group. Circle
"Yes" or "No" for each one.

a  3 -> 6 -> 12      Yes   No
b  2 -> 4 -> 20      Yes   No
c  4 -> 12 -> 24     Yes   No

Now write another set of numbers that
follows this kind of sequence.




20 A customer at the health food store where
you work wants a mixture of oat and bran
cereals in a certain ratio. You have to make
the mixture for him using cereal premixed
in certain other ratios. Show how many
boxes of each premixed cereal would produce
the ordered mixture.

order           premix #1       premix #2
48 oz. oats     1 oz. oats      7 oz. oats
72 oz. bran     7 oz. bran      5 oz. bran

____ boxes of premix #1
____ boxes of premix #2



21 



a What is 250% of 600? 
b 5% of what number is 100? 



The manager of a gift shop is ordering
sweatshirts. Each sweatshirt costs $7. She
has budgeted $600. How many can she
order?



sweatshirts 



2 Company Profits

[Graph: company profits for Week 1 through Week 6]

During which week did the company make
the least profit?

week # ____

During which week(s) did profit decrease
from the preceding week?

weeks # ____



3 [Blank graph grid: speed in mi/hr (5 to 20) against time in minutes (2 to 10)]

Draw a graph to show the speed of a bicycle
going along at 15 mi./hr. for 5 minutes and
then coasting to a stop over the next two
minutes.




29 31 



Which plant grew more rapidly during 
July? Circle one choice below. 



B 




Part G is [fraction illegible] of the large triangle. What
fraction of the large triangle is ....

Part H?






The shaded part? 





6 



Who am I?

I am a proper fraction. I am equivalent to
[fraction illegible]. The sum of my numerator and
denominator is 33.

Who am I?

7



4 T 

10 



8 Write >, <, or = in each circle to make 
correct statements. 



4 O T 

12 V, / 5 



Find the missing values.

□ 



□ 



means that the same 
number goes in both 
squares. 




12 



2 

12x 7 



10 What is the perimeter 
and what is the area? 

(The lengths of some sides must be
calculated first.)



4±ft. 
2 



1 

1 7* 



3 ft 



perimeter: 


ft. 


area: 


sq. ft. 



1 ft 



11 Check the boxes next to the facts that you 
need to solve the problem. Then solve. 

Problem: How many cans of paint does 
Kay need to buy? 



- A can of paint will cover 3 [fraction illegible] walls.
- Each wall is 8 [fraction illegible] ft high.
- Each room has 4 walls.
- Kay must paint 4 rooms.
- It takes Kay 40 minutes to paint each wall.
- Kay already has 1 [fraction illegible] cans of paint.




12 Solve if you have enough information. If
not, tell what fact is missing.

One program is on TV for 88 minutes. It is
interrupted 8 times for commercials. What
fraction of the time for the program is used
for commercials?




13 Use the numbers below to write proportions.

a 3, 24, 6, 12

b 8, 6, 12, 9

c 9, 2, 6, 3

d 12, 15, 16, 20




14 John has 4 red marbles, 2 blue marbles and 
3 black marbles in a bag. 



What is the probability (fraction) that the 
first marble he pulls out will be black? 



If it is black, what is the probability that the 
second marble will also be black? 



15 Write in order from least to greatest 
8.063, 80.002, 8.603, 80.01, 80.009 




16 On a circus train, a lion requires 4.5 cubic
meters of space, a monkey requires 10.2
cubic meters. An elephant requires as much
space as a monkey and lion combined. How
many elephants could be transported if
there are 150 cubic meters of space available
for elephants?



elephants 



17 I have a secret 4-digit decimal number.

Guess my number.

Here are some hints: guesses that were too
high or too low.



Guesses that were 
Too High 


Guesses that were 
Too Low 


2.5(2)4 
1.742 
®.®8 2 


.04®3 
.0@75 
.0987 



Another hint: If a digit was correct and in
the correct decimal position (tenths,
hundredths, etc.), I put a circle around it.

Guess my number (you can figure it out from 
the hints.) 



Your guess 






APPENDIX B: Mental Math/Estimation Posttests 
(Grade 7-8 Version) 



(Reduced to one-fourth normal size)



35 
+ 13 



2 



900 
X60 




25 
X15 
96 ÷ 16



49 






What is the 
AVERAGE? 

24, 36, 45 



1395 
4795 
+ 1095 






   420        $ 2.28
   370          8.67
   280          7.25
   130          5.92
 +  50        + 6.15



! 



29 X 31 




303 



X 




309,386 

51 



13 



14 



9 1 46,000 



1214955 




3.1 X 4.98 



11.03 
X 0.51 
4t +6



9 



10 




What fraction of the
circle?



3 




How many 
degrees? 




What is 




30 



170 